Efficient Bayesian Inference for Learning in the Ising Linear 
Perceptron and Signal Detection in CDMA 

Juan P. Neirotti and David Saad 
The Neural Computing Research Group, 
Aston University, Birmingham B4 7ET, UK. 

Abstract 

An efficient new Bayesian inference technique is employed for studying critical properties of the Ising 
linear perceptron and for signal detection in Code Division Multiple Access (CDMA). The approach 
is based on a recently introduced message passing technique for densely connected systems. Here 
we study both critical and non-critical regimes. Results obtained in the non-critical regime give 
rise to a highly efficient signal detection algorithm in the context of CDMA; while in the critical 
regime one observes a first order transition line that ends in a continuous phase transition point. 
Finite size effects are also studied. 

PACS numbers: 89.70.+c, 75.10.Nr, 64.60.Cn 

I. INTRODUCTION 



Efficient inference in large complex systems is a major challenge with significant impli- 
cations in science, engineering and computing. Exact inference is computationally hard in 
complex systems and a range of approximation methods have been devised over the years, 
many of which originated in the physics literature. A recent review [1] highlights 
the links between the various approximation methods and their applications. 

In the current paper, we extend a method that was introduced only recently Q] for 
inference in dense graphs using message passing techniques. The method has been employed 
previously only in the non-critical regime Q , and is used here for studying both critical and 
non-critical regimes. We apply the method to two different but related problems: signal 
detection in CDMA and learning in the Ising linear perceptron (ILP). 

Multiple access communication refers to the transmission of multiple messages to a single 
receiver. The scenario we study here is that of K users transmitting independent messages 
over an additive white Gaussian noise channel of zero mean and variance $\sigma_0^2$. In the scenario 
of a Code Division Multiple Access (CDMA) system j^, the signal from each user is modu- 
lated by a randomly chosen spreading code of length N; these signals are added up and sent 
through a noisy channel to the receiving station, which extracts the original message from 
the received signal using knowledge of the user's spreading codes. 

We consider the large-system limit, in which the number of users $K$ tends to infinity 
while the system load $\beta = K/N \sim O(1)$. We focus on a CDMA system using binary phase 
shift keying symbols and will assume the power is completely controlled to unit energy. The 
received aggregated, modulated and corrupted signal is then of the form: 

$$y_\mu = \frac{1}{\sqrt{N}} \sum_{k=1}^{K} s_{\mu k} b_k + \sigma_0 n_\mu , \qquad (1)$$

where $b_k$ is the bit transmitted by user $k$, $s_{\mu k}$ the spreading chip value, $n_\mu$ the Gaussian 
noise variable drawn from $\mathcal{N}(0, 1)$, and $y_\mu$ the received message. This process is reminiscent 
of the learning task performed by a perceptron with binary weights and linear output. 
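As an illustration of the signal model, the following minimal sketch generates a received CDMA signal. It assumes binary chips $s_{\mu k} = \pm 1$ and the $1/\sqrt{N}$ normalisation used for the CDMA scaling later in the text; the function name and the toy sizes are hypothetical and chosen only for the example.

```python
import math
import random

def cdma_received_signal(bits, codes, sigma0, rng):
    """y_mu = (1/sqrt(N)) * sum_k s_mu_k * b_k + sigma0 * n_mu (assumed signal model)."""
    n_chips = len(codes)
    scale = 1.0 / math.sqrt(n_chips)
    return [
        scale * sum(s * b for s, b in zip(row, bits)) + sigma0 * rng.gauss(0.0, 1.0)
        for row in codes
    ]

rng = random.Random(0)
K, N = 5, 20                                   # toy sizes, load beta = K / N = 0.25
bits = [rng.choice((-1, 1)) for _ in range(K)]
codes = [[rng.choice((-1, 1)) for _ in range(K)] for _ in range(N)]  # row mu holds s_mu_k
y = cdma_received_signal(bits, codes, sigma0=0.5, rng=rng)
y_clean = cdma_received_signal(bits, codes, sigma0=0.0, rng=rng)     # noise-free reference
```

Each of the $N$ entries of `y` mixes all $K$ bits, which is what makes the underlying graph densely connected.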

The perceptron is a network which sums a single layer of inputs $s_{\mu k}$, each weighted by 
a corresponding synaptic weight $b_k$; the cumulative contribution is an argument of some 
transfer function $g(\cdot)$ that gives rise to the output $y_\mu$ 

$$y_\mu = g\left( \frac{1}{\sqrt{K}} \sum_{k=1}^{K} s_{\mu k} b_k \right) . \qquad (2)$$

A normalisation factor $K^{-1/2}$ is included to make the argument of the transfer function 
of $O(1)$. If the entries of $\mathbf{b}$ are $\pm 1$ the perceptron is termed an Ising perceptron. When the 
transfer function is the identity, the perceptron is referred to as linear. 

The similarity between the linear perceptron of Eq. (2) and the CDMA detection problem 
of Eq. (1) allows for a direct relation between the two problems to be established. The 
main difference between the problems is the regime of interest. While CDMA detection 
applications are of interest mainly for non-critical low load values, ILP studies have focused on 
the critical regime. We consider both regimes in this paper, but to unify the treatment we 
will use the notation and scaling conventions of the CDMA system. 
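The relation between the two problems can be made concrete with a small sketch. It assumes the linear perceptron field is normalised by $K^{-1/2}$ and the noise-free CDMA signal by $N^{-1/2}$, so that the two differ only by a factor $\sqrt{\beta}$; the function names and the toy vectors are hypothetical.

```python
import math

def ilp_field(weights, inputs):
    """Linear Ising perceptron output with identity transfer function, K^(-1/2) scaling."""
    return sum(s * b for s, b in zip(inputs, weights)) / math.sqrt(len(weights))

def cdma_clean_signal(bits, chips, n_chips):
    """Noise-free part of the CDMA signal, N^(-1/2) scaling."""
    return sum(s * b for s, b in zip(chips, bits)) / math.sqrt(n_chips)

bits = [1, -1, -1, 1, 1, -1]       # K = 6 binary weights / transmitted bits
chips = [1, 1, -1, -1, 1, -1]      # one input pattern, i.e. one chip per user
N = 24                             # spreading-code length, so beta = K / N = 0.25
beta = len(bits) / N
ratio = cdma_clean_signal(bits, chips, N) / ilp_field(bits, chips)  # equals sqrt(beta)
```

Under these assumptions the same inference machinery applies to both systems once the scaling factor is tracked.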

II. MESSAGE PASSING 

Graphical models (Bayes belief networks) provide a powerful framework for modelling 
statistical dependencies between variables [a, Q, B|- They play an essential role in devising 
a principled probabilistic framework for inference in a broad range of applications. 

Message passing techniques are guaranteed to converge to the globally correct estimate 
in graphical models that can be represented by a tree; in practice they are typically applied 
to sparse graphs with a few (typically long) loops. There are no such guarantees for systems 
with loops even in the case of large loops 
and a local tree-like structure (although see M). A clear link has been established between 
certain message passing algorithms and methods of statistical mechanics |1L llfll 

In a recent development, we presented a new approach [3], which was inspired both by 
the extension of Belief Propagation (BP) to tackle densely connected graphs [2] and by that of 
the replica-symmetric-equivalent BP to Survey Propagation (SP). 

The systems we consider here are characterised by a multiplicity of pure states and a possible 
fragmentation of the space of solutions. To address the inference problem in such cases 
we consider an ensemble of replicated systems where averages are taken over the ensemble 
of potential solutions. This amounts to the presentation of a new graph, where the observables 
$y_\mu$ are linked to variables in all replicated systems, namely $B = (\mathbf{b}^1, \mathbf{b}^2, \ldots, \mathbf{b}^n)$, where 
$\mathbf{b}^a = (b^a_1, \ldots, b^a_K)^T$. To estimate the parameters $B$ given the data $\mathbf{y} = (y_1, y_2, \ldots, y_N)^T$, in 
a Bayesian framework, we have to maximise the posterior $P(B|\mathbf{y}) \propto \prod_{\mu=1}^{N} P(y_\mu|B)\, P(B)$, 
where we have considered independent data, and thus $P(\mathbf{y}|B) = \prod_{\mu=1}^{N} P(y_\mu|B)$. 

The likelihood so defined is of a general form; the explicit expression depends on the 
particular problem studied. Here, we are interested in cases where $\mathbf{b} \in \{\pm 1\}^K$ is an unbiased 
vector and $P(B) = 2^{-Kn}$. The estimate we would like to obtain is the maximiser of the 
posterior marginal (MPM) $\hat{b}^a_k = \mathrm{argmax}_{b^a_k \in \{\pm 1\}} \mathrm{Tr}_{\{b_{l \neq k}\}} P(B|\mathbf{y})$, which is expected to be 
a vector with equal entries for all replica, $\hat{b}^1_k = \hat{b}^2_k = \cdots = \hat{b}^n_k$. The number of operations 
required to obtain the full MPM estimator is of $O(2^K)$, which is infeasible for large $K$ values. 
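To make the exponential cost tangible, here is a minimal brute-force MPM sketch for a single replica ($n = 1$). It assumes a Gaussian likelihood with noise parameter `sigma` and a uniform prior over the bits; the function name and the toy instance (orthogonal codes, no channel noise) are hypothetical and serve only as an illustration.

```python
import itertools
import math

def brute_force_mpm(y, eps, sigma):
    """Exact MPM estimate for one replica by summing over all 2^K bit vectors."""
    K = len(eps[0])
    mass_plus = [0.0] * K            # unnormalised posterior mass with b_k = +1
    total = 0.0
    for b in itertools.product((-1, 1), repeat=K):
        log_like = 0.0
        for y_mu, row in zip(y, eps):
            delta = sum(e * bk for e, bk in zip(row, b))
            log_like -= (y_mu - delta) ** 2 / (2.0 * sigma ** 2)
        weight = math.exp(log_like)  # uniform prior: posterior proportional to likelihood
        total += weight
        for k, bk in enumerate(b):
            if bk == 1:
                mass_plus[k] += weight
    return [1 if 2.0 * mp >= total else -1 for mp in mass_plus]

# Toy instance: K = 3 users, N = 4 chips, mutually orthogonal codes, no channel noise.
codes = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
eps = [[s / math.sqrt(len(codes)) for s in row] for row in codes]
true_bits = [1, -1, 1]
y = [sum(e * b for e, b in zip(row, true_bits)) for row in eps]
estimate = brute_force_mpm(y, eps, sigma=1.0)
```

The outer loop visits $2^K$ configurations, so doubling the number of users squares the work; this is the cost that the message passing approach avoids.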

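For calculating the posterior in the case of both CDMA and the ILP, we use the explicit 
dependency of $y_\mu$ on $b_k$ from Eqs. (1) and (2), $y_\mu = \sum_{k=1}^{K} \epsilon_{\mu k} b_k + \sigma n_\mu$, where $\sigma$ is a free 
parameter of the model, to be optimised later in the process; it reflects our ignorance of 
the true noise parameter $\sigma_0$. The variable $n_\mu$ is drawn from $\mathcal{N}(0, 1)$ and the $\epsilon_{\mu k}$ are small 
enough to ensure that $\sum_{k=1}^{K} \epsilon_{\mu k} b_k \sim O(1)$. For facilitating the derivation that follows we also 
define the variable $\boldsymbol{\Delta}_\mu = \sum_{k=1}^{K} \epsilon_{\mu k} \mathbf{b}_k = \sum_{l \neq k} \epsilon_{\mu l} \mathbf{b}_l + \epsilon_{\mu k} \mathbf{b}_k = \boldsymbol{\Delta}_{\mu k} + \epsilon_{\mu k} \mathbf{b}_k$, representing the 
uncorrupted signal. Subsequently, the likelihood can be expanded to take the form 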



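$$P(y_\mu | B) \propto \prod_{a=1}^{n} \exp\left\{ -\frac{\left( y_\mu - \Delta^a_{\mu k} - \epsilon_{\mu k} b^a_k \right)^2}{2\sigma^2} \right\} . \qquad (3)$$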
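
Using Bayes rule one obtains the BP equations 

$$P^{t+1}(y_\mu | \mathbf{b}_k, \{y_{\nu \neq \mu}\}) = \mathrm{Tr}_{\{\mathbf{b}_{l \neq k}\}} P(y_\mu | B) \prod_{l \neq k} P^t(\mathbf{b}_l | \{y_{\nu \neq \mu}\}) , \qquad (4)$$

$$P^t(\mathbf{b}_l | \{y_{\nu \neq \mu}\}) \propto \prod_{\nu \neq \mu} P^t(y_\nu | \mathbf{b}_l, \{y_{\sigma \neq \nu}\}) . \qquad (5)$$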



An explicit expression for the inter-dependency between solutions is required for obtaining 
a closed set of update equations. We assume a dependence of the form $P^t(\mathbf{b}_k | \{y_{\nu \neq \mu}\}) \propto 
\exp\{ (\mathbf{h}^t_{\mu k})^T \mathbf{b}_k + \frac{1}{2} \mathbf{b}_k^T Q^t_{\mu k} \mathbf{b}_k \}$, where $\mathbf{h}^t_{\mu k}$ is a vector representing an external field and $Q^t_{\mu k}$ the 
matrix of cross-replica interaction. We expect the free energy obtained from the well behaved 
distribution $P^t$ to be self-averaging, thus we assume the following symmetry between replica: 
$(Q^t_{\mu k})^{ab} = (1 - \delta^{ab})\, g^t_{\mu k}/n$ and $(\mathbf{h}^t_{\mu k})^a = h^t_{\mu k}$, where both $h^t_{\mu k}$ and $g^t_{\mu k}$ are of $O(1)$. An 
expression for $P^t$ immediately follows 
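
$$P^t(\mathbf{b}_k | \{y_{\nu \neq \mu}\}) = \frac{\int_{-\infty}^{\infty} dx\, \exp\left\{ -n \frac{(x - h^t_{\mu k})^2}{2 g^t_{\mu k}} + x \sum_{a=1}^{n} b^a_k \right\}}{\int_{-\infty}^{\infty} dx\, \exp\left\{ -n\, \Phi(x; h^t_{\mu k}, g^t_{\mu k}) \right\}} , \qquad (6)$$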

where $\Phi(x; h^t_{\mu k}, g^t_{\mu k}) = (x - h^t_{\mu k})^2 / 2 g^t_{\mu k} - \ln(2\cosh(x))$. We exploit the assumption that the 
number of replica $n$ is large and employ Laplace's method to find dominant contributions to 
the integral. The function $\Phi(x; h^t_{\mu k}, g^t_{\mu k})$ exhibits two minima if $h^t_{\mu k} \to 0$ and $g^t_{\mu k} > 1$; these 
will provide the only contributions in that limit. Other regimes will provide trivial solutions. 
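If the field goes to zero as $m^t_{\mu k} h^t_{\mu k} \simeq \ln\left( 4n (\eta^t_{\mu k})^2 \right) / 2n$, where $m^t_{\mu k}$ is the spontaneous 
magnetisation and $\eta^t_{\mu k}$ a constant, the first two moments of $\mathbf{b}_k$ are, up to $O(n^{-1})$, 

$$\langle b^a_k \rangle = \mathrm{Tr}_{\{\mathbf{b}_k\}} P^t(\mathbf{b}_k | \{y_{\nu \neq \mu}\})\, b^a_k \simeq m^t_{\mu k} ,$$

$$\langle b^a_k b^b_k \rangle - \langle b^a_k \rangle \langle b^b_k \rangle \simeq \delta^{ab} \left[ 1 - (m^t_{\mu k})^2 \right] + (1 - \delta^{ab}) \frac{1}{n} \left( \frac{m^t_{\mu k}}{\eta^t_{\mu k}} \right)^2 .$$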
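The two-minima condition on $\Phi$ can be checked numerically. The sketch below assumes $\Phi(x; h, g) = (x - h)^2/2g - \ln(2\cosh x)$ as quoted in the text, whose stationarity condition is $x = h + g \tanh(x)$; the function names and the particular values of $g$ are illustrative choices only.

```python
import math

def phi(x, h, g):
    """Phi(x; h, g) = (x - h)^2 / (2 g) - ln(2 cosh x)."""
    return (x - h) ** 2 / (2.0 * g) - math.log(2.0 * math.cosh(x))

def stationary_point(h, g, x0=1.0, iterations=200):
    """Iterate x <- h + g * tanh(x), i.e. the condition dPhi/dx = 0."""
    x = x0
    for _ in range(iterations):
        x = h + g * math.tanh(x)
    return x

x_ordered = stationary_point(h=0.0, g=2.0)   # g > 1: non-zero minimum, mirrored at -x
x_trivial = stationary_point(h=0.0, g=0.5)   # g < 1: the iteration collapses to x = 0
magnetisation = math.tanh(x_ordered)         # spontaneous magnetisation at the minimum
```

For $h = 0$ the two minima at $\pm x$ are degenerate; a small field tilts their relative weight, which is why the scaling of the field with $n$ matters in the moment calculation.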
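
However, for calculating the posterior we need the distribution on the variable $\boldsymbol{\Delta}_{\mu k}$, 
which is a sum of a large number of unbiased and uncorrelated random variables $\epsilon_{\mu l} \mathbf{b}_l$. 
Therefore, by virtue of the central limit theorem, the variable $\boldsymbol{\Delta}_{\mu k} = \sum_{l \neq k} \epsilon_{\mu l} \mathbf{b}_l$ obeys a 
normal distribution, whose mean value and covariance matrix are given by 

$$(\mathbf{u}^t_{\mu k})^a \equiv \langle \Delta^a_{\mu k} \rangle = \mathrm{Tr}_{\{\mathbf{b}_{l \neq k}\}} \prod_{l \neq k} P^t(\mathbf{b}_l | \{y_{\nu \neq \mu}\}) \sum_{l \neq k} \epsilon_{\mu l} b^a_l = \sum_{l \neq k} \epsilon_{\mu l} m^t_{\mu l} , \qquad (7)$$

$$(\Upsilon^t_{\mu k})^{ab} \equiv \langle \Delta^a_{\mu k} \Delta^b_{\mu k} \rangle - \langle \Delta^a_{\mu k} \rangle \langle \Delta^b_{\mu k} \rangle = \sum_{l \neq k} \epsilon^2_{\mu l} \left( \langle b^a_l b^b_l \rangle - \langle b^a_l \rangle \langle b^b_l \rangle \right) = \delta^{ab} \left( X^t_{\mu k} - Q^t_{\mu k} \right) + (1 - \delta^{ab}) \frac{1}{n} R^t_{\mu k} , \qquad (8)$$

where $X^t_{\mu k} = \sum_{l \neq k} \epsilon^2_{\mu l}$, $Q^t_{\mu k} = \sum_{l \neq k} (\epsilon_{\mu l} m^t_{\mu l})^2$ and $R^t_{\mu k} = \sum_{l \neq k} (\epsilon_{\mu l} m^t_{\mu l} / \eta^t_{\mu l})^2$ are 
macroscopic variables of $O(1)$. In particular, $R^t_{\mu k}$ is a free variable that can be used to 
optimise with respect to a given performance measure. This corresponds to fine tuning of 
the variational model considered. All three quantities $X^t_{\mu k}$, $Q^t_{\mu k}$ and $R^t_{\mu k}$ are self-averaging, 
so we can drop both indices $\mu$ and $k$. The probability of $\boldsymbol{\Delta}_{\mu k}$ can be expressed as 

$$P(\boldsymbol{\Delta}_{\mu k}) \propto \exp\left\{ -\frac{1}{2} \left( \boldsymbol{\Delta}_{\mu k} - \mathbf{u}^t_{\mu k} \right)^T \left( \Upsilon^t_{\mu k} \right)^{-1} \left( \boldsymbol{\Delta}_{\mu k} - \mathbf{u}^t_{\mu k} \right) \right\} .$$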



Having the probability distribution of we can express the message from nodes to 
nodes b k at time t + 1 explicitly, using Eqs. (j3J), (jlj) and © 

It is then straightforward to prove the following equation and its approximation, the 
latter due to the fact that $\hat{m}^t_{\mu k} \sim O(\epsilon_{\mu k})$ 

$$m^t_{\mu k} = \tanh\left( \sum_{\nu \neq \mu} \mathrm{arctanh}\, \hat{m}^t_{\nu k} \right) \simeq \tanh\left( \sum_{\nu \neq \mu} \hat{m}^t_{\nu k} \right) . \qquad (10)$$

To study the quality of the inferred vectors one considers the gauged field with respect to 
the true message b^h 1 ^ where h^ k = artanh (jn^j = artanh (ra* fe ) ~ The 

distribution of this field is likely to be well approximated by a Gaussian, as a result of the 
central limit theorem, whose mean and variance are E 1 and F l respectively 

K N K N 

k=l fj,=l k=l /j,=1 

both are assumed to be independent of the index $\mu$ due to self-averaging. For the same 
reason we expect the macroscopic variables, representing the overlap between the vectors 
$\mathbf{m}^t_\mu$ and $\mathbf{b}$ at any time $t$ and the squared length of $\mathbf{m}^t_\mu$, defined as $M^t_\mu = \sum_{k=1}^{K} b_k m^t_{\mu k} / K \simeq 
\sum_{k=1}^{K} b_k m^t_k / K = M^t$ and $N^t_\mu \equiv \sum_{k=1}^{K} (m^t_{\mu k})^2 / K \simeq \sum_{k=1}^{K} (m^t_k)^2 / K = N^t$, to be $\mu$ 
independent. Using the distribution we obtained for the gauged field $b_k h^t_{\mu k}$, both variables can 
be evaluated by $M^t = \int Du \tanh\left( \sqrt{F^t}\, u + E^t \right)$, $N^t = \int Du \tanh^2\left( \sqrt{F^t}\, u + E^t \right)$, where 
$Du = \exp(-u^2/2)\, du / \sqrt{2\pi}$. Applying a method equivalent to the EM algorithm [13] for the 
independent parameter of the model $\sigma^2$ we have that the optimal selection of the parameter 
is given by the condition $E^t = F^t$, which also implies that $N^t = M^t$. Notice that this result 
is not surprising as it maximises the normalised overlap between the vectors $\mathbf{m}^t$ and $\mathbf{b}$. 
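The statement that $E^t = F^t$ implies $N^t = M^t$ can be verified numerically. The sketch below assumes the gauged field is Gaussian with mean $E$ and variance $F$, and evaluates the two Gaussian averages with a plain trapezoid rule; the helper names, the integration window and the test values are illustrative choices, not part of the original derivation.

```python
import math

def gauss_average(func, half_width=8.0, steps=1600):
    """Trapezoid estimate of int Du func(u), with Du = exp(-u^2 / 2) du / sqrt(2 pi)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        u = -half_width + i * h
        edge = 0.5 if i in (0, steps) else 1.0
        total += edge * math.exp(-0.5 * u * u) * func(u)
    return total * h / math.sqrt(2.0 * math.pi)

def overlap_M(E, F):
    """M = int Du tanh(sqrt(F) u + E)."""
    return gauss_average(lambda u: math.tanh(math.sqrt(F) * u + E))

def length_N(E, F):
    """N = int Du tanh^2(sqrt(F) u + E)."""
    return gauss_average(lambda u: math.tanh(math.sqrt(F) * u + E) ** 2)

M_matched = overlap_M(1.5, 1.5)    # E = F: the two averages coincide
N_matched = length_N(1.5, 1.5)
```

The equality at $E = F$ follows from the symmetry $p(-h) = e^{-2h} p(h)$ of a Gaussian whose mean equals its variance.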

It is important to notice at this point the different scaling factors used in the two 
models we examine. For CDMA one uses $\epsilon_{\mu k} = s_{\mu k} / \sqrt{N}$ while $\epsilon_{\mu k} = s_{\mu k} / \sqrt{K}$ is used 
for the ILP. Imposing the condition $E^t = F^t$ leads to a relation between the structure 
of the space of solutions, represented by $R^t$, and the free parameter of the model 
$\sigma^2$. From Eqs. (11) one obtains for the two models $E^{t+1} = \epsilon_1^{-1} \left[ \sigma^2 + R^t + \epsilon_2 (1 - N^t) \right]^{-1}$, 
$F^{t+1} = \epsilon_1 \left[ \sigma_0^2 + \epsilon_2 (1 - N^t) \right] (E^{t+1})^2$, where $\epsilon_1 = 1$ ($\beta$) and $\epsilon_2 = \beta$ (1) for the CDMA (ILP) 
system. This implies, after simplification, that for both cases $R^t = \sigma_0^2 - \sigma^2$. 
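For reference, the simplification can be written out in a few lines. This is a sketch that takes the two recursions in the form $E^{t+1} = \epsilon_1^{-1}[\sigma^2 + R^t + \epsilon_2(1 - N^t)]^{-1}$ and $F^{t+1} = \epsilon_1[\sigma_0^2 + \epsilon_2(1 - N^t)](E^{t+1})^2$, with the true noise variance $\sigma_0^2$ entering $F^{t+1}$; that reading of the expressions is an assumption.

```latex
\begin{align}
E^{t+1} = F^{t+1}
  &\;\Longrightarrow\; 1 = \epsilon_1 \left[ \sigma_0^2 + \epsilon_2 (1 - N^t) \right] E^{t+1} \\
  &\;\Longrightarrow\; \sigma^2 + R^t + \epsilon_2 (1 - N^t) = \sigma_0^2 + \epsilon_2 (1 - N^t) \\
  &\;\Longrightarrow\; R^t = \sigma_0^2 - \sigma^2 .
\end{align}
```

The factors $\epsilon_1$ and $\epsilon_2$ cancel, which is why the same relation holds for both the CDMA and the ILP scaling.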



Despite the simplicity of this result, the process from which we obtained it provides a 
mechanism for estimating the true noise variance. In deriving $E^t$ and $F^t$ we used the fact 
that $K, N \to \infty$ with $K/N = \beta$, so that the true noise variance $\sigma_0^2$ that appears in the 
expression for $F^t$ has been obtained from a signal vector with an infinite number of entries 
$y_\mu$. Thus $\lim_{N \to \infty} N^{-1} \sum_{\mu=1}^{N} (y_\mu)^2 = \epsilon_2 + \sigma_0^2$. Using this we can express the message as 

(12) 

where no prior belief of the noise level $\sigma_0$ is required. 
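The noise estimate implied by the second-moment relation can be illustrated with a short simulation. The sketch assumes the CDMA scaling ($\epsilon_2 = \beta$), binary chips and bits, and finite toy sizes, so the estimate is only approximate; the function name and the parameter values are hypothetical.

```python
import math
import random

def estimate_noise_variance(y, eps2):
    """Estimate sigma0^2 as the empirical mean of y_mu^2 minus eps2 (beta for CDMA)."""
    return sum(v * v for v in y) / len(y) - eps2

rng = random.Random(1)
N, K, sigma0_sq = 2000, 500, 0.25            # load beta = K / N = 0.25
bits = [rng.choice((-1, 1)) for _ in range(K)]
y = []
for _ in range(N):
    # one received chip: random +-1 code entries, 1/sqrt(N) scaling, Gaussian noise
    clean = sum(rng.choice((-1, 1)) * b for b in bits) / math.sqrt(N)
    y.append(clean + math.sqrt(sigma0_sq) * rng.gauss(0.0, 1.0))
estimate = estimate_noise_variance(y, eps2=K / N)
```

For finite $N$ the estimate fluctuates around the true value with a spread that shrinks as $1/\sqrt{N}$.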

The steady state equation for the macroscopic variable $E^t$ is obtained in the limit $t \to \infty$, 
leading to the definition of $E = \lim_{t \to \infty} E^t$. In this regime the following relation holds 

$$E(\sigma_0^2, \beta) = \epsilon_1^{-1} \left\{ \sigma_0^2 + \epsilon_2 \left[ 1 - \int Du \tanh^2\left( \sqrt{E(\sigma_0^2, \beta)}\, u + E(\sigma_0^2, \beta) \right) \right] \right\}^{-1} . \qquad (13)$$
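The steady state can be found by iterating the relation as a fixed-point map. The sketch below assumes the form $E = \epsilon_1^{-1}\{\sigma_0^2 + \epsilon_2[1 - \int Du \tanh^2(\sqrt{E}u + E)]\}^{-1}$ with the CDMA scaling as default; the function names, the starting point and the tolerances are illustrative choices.

```python
import math

def tanh_sq_average(E, half_width=8.0, steps=1600):
    """Trapezoid estimate of int Du tanh^2(sqrt(E) u + E), Du the unit Gaussian measure."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        u = -half_width + i * h
        edge = 0.5 if i in (0, steps) else 1.0
        total += edge * math.exp(-0.5 * u * u) * math.tanh(math.sqrt(E) * u + E) ** 2
    return total * h / math.sqrt(2.0 * math.pi)

def steady_state_E(sigma0_sq, beta, eps1=1.0, eps2=None, tol=1e-10, max_iter=5000):
    """Iterate E <- 1 / (eps1 * (sigma0^2 + eps2 * (1 - N(E)))); return (E, steps used)."""
    if eps2 is None:
        eps2 = beta                           # CDMA scaling: eps1 = 1, eps2 = beta
    E = 1.0 / (eps1 * (sigma0_sq + eps2))     # start from N = 0, i.e. no information
    for step in range(1, max_iter + 1):
        E_new = 1.0 / (eps1 * (sigma0_sq + eps2 * (1.0 - tanh_sq_average(E))))
        if abs(E_new - E) < tol:
            return E_new, step
        E = E_new
    return E, max_iter                        # not converged within max_iter

E_star, n_steps = steady_state_E(sigma0_sq=0.25, beta=0.25)
```

The returned step count gives a rough handle on how quickly the recursion settles for a given load and noise level.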



From these expressions one can calculate directly the error per bit rate 

$$P_b = \int_{\sqrt{E(\sigma_0^2, \beta)}}^{\infty} Du . \qquad (14)$$
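A minimal sketch of the error-rate evaluation follows. It assumes the error per bit rate is the Gaussian tail probability beyond $\sqrt{E}$, which is what a gauged field with mean $E$ and variance $E$ implies for the probability of a wrong sign; the function name is hypothetical.

```python
import math

def bit_error_rate(E):
    """Gaussian tail probability Q(sqrt(E)) = 0.5 * erfc(sqrt(E / 2))."""
    return 0.5 * math.erfc(math.sqrt(E / 2.0))

error_uninformed = bit_error_rate(0.0)   # E = 0: pure guessing
error_at_one = bit_error_rate(1.0)       # tail of a unit Gaussian beyond 1
```

Larger $E$ means a gauged field that is better separated from zero and therefore a lower error rate.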



III. NUMERICAL RESULTS 

The inference algorithm requires an iterative update of Eqs. (10) and (12) until they 
converge to a reliable estimate of the signal. We emphasise again that there is no need for 
prior information on the noise level. The computational complexity of the algorithm, as it 
has been presented here, is of $O(NK^2)$ but can be reduced to $O(K^2)$ in a similar way to 
the approach taken in [2]. 

To test the performance of our algorithm we carried out a set of experiments of CDMA 
signal detection under typical conditions. Error probability of the inferred signals has been 
calculated for a system load of $\beta = 0.25$, where the true noise level is $\sigma_0^2 = 0.25$ and the 
estimated noise is $\sigma^2 = 0.01$, as shown in Figure 1(a). The solid line represents the expected 
theoretical results (density evolution), knowing the exact values of $\sigma_0^2$ and $\sigma^2$, while circles 
represent simulation results obtained via the suggested practical algorithm, where no such 
knowledge is assumed. The results presented are based on $10^5$ trials per point and a system 
size $N = 2000$, and are superior to those obtained using the original algorithm of Ref. [2]. 

FIG. 1: (a) Error probability of the inferred solution as a function of time. The system load is 
$\beta = 0.25$, with true and estimated noise levels $\sigma_0^2 = 0.25$ and $\sigma^2 = 0.01$, respectively. Squares represent 
results obtained by the algorithm of [2], solid line the dynamics obtained from our equations; circles 
represent results obtained from the suggested practical algorithm. Variances are smaller than the 
symbol size. (b) The measure of convergence $D$ of the obtained solutions, as a function of time; 
symbols are as in (a). 

Another performance measure to be considered is $D^t = K^{-1} (\mathbf{m}^t - \mathbf{m}^{t-1}) \cdot (\mathbf{m}^t - \mathbf{m}^{t-1})$, which 
provides an indication of the stability of the solutions obtained. In Fig. 1(b) we present 
results obtained from our algorithm, which exhibits fast convergence to a reliable solution, in 
stark contrast to the original algorithm [2], which does not converge. 
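The convergence measure is straightforward to compute from two successive magnetisation vectors. A minimal sketch, with a hypothetical function name and toy vectors:

```python
def convergence_measure(m_now, m_prev):
    """D^t = (1/K) * sum_k (m_k^t - m_k^(t-1))^2, the mean squared change per user."""
    K = len(m_now)
    return sum((a - b) ** 2 for a, b in zip(m_now, m_prev)) / K

d_changed = convergence_measure([1.0, -1.0], [1.0, 1.0])   # one of two entries flips sign
d_settled = convergence_measure([0.5, 0.5], [0.5, 0.5])    # no change between iterations
```

A value that decays towards zero indicates that the iteration has settled, whereas a value that stays finite signals oscillation or drift.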

For the ILP, the $K > N$ regime is highly interesting as the system develops a critical 
behaviour for a range of $\sigma_0^2$ values. We carried out a set of experiments for this system 
(the CDMA scaling was kept for consistency) based on density evolution. In Fig. 2(a) 
we present curves of the error per bit rate, defined in Eq. (14), as a function of the inverse load 
$\beta^{-1}$ for different values of $\sigma_0^2$. Three different regimes have been observed: for $\sigma_0^2 < 0.15$ the curves 
exhibit a discontinuity at a value of $\beta$ that varies with $\sigma_0^2$ (first order phase transition-like 
behaviour). At $\sigma_0^2 = 0.15$ the curve becomes continuous but its slope diverges (second order 
phase transition-like behaviour). The curves show analytic behaviour for noise values 
above 0.15. In Fig. 2(b) we present a phase diagram of the CDMA system. It shows the 
dependency of the critical inverse load $\beta_c^{-1}$ on the noise parameter. The first order line 
ends in a second order transition point marked by a circle. 

FIG. 2: (a) Error per bit rate at the steady state, Eq. (14), as a function of $\beta^{-1}$ for different values of the noise 
parameter. For values of $\sigma_0^2$ below 0.15 the curves show a discontinuity at certain $\beta$ values; the curve 
becomes continuous but non-analytic at $\sigma_0^2 = 0.15$ around $\beta^{-1} \simeq 0.68$. For noise variance values 
above $\sigma_0^2 = 0.15$ the curves become analytic. (b) Position of the non-analyticity of the error rate 
curve $\beta_c^{-1}$ as a function of the noise parameter $\sigma_0^2$. This first order phase transition-like curve ends 
in a second order phase transition-like point marked by a circle. 

Another indication of the critical behaviour is the number of steps required for the 
recursive update of Eq. (13) to converge. In Fig. 3(a) we present the number of iterations 
needed to reach a steady state as a function of $\beta$ when the noise parameter is set to 
$\sigma_0^2 = 0.10$. The number of iterations diverges when the critical value of $\beta$ is reached. 

Finally, we wish to explore the efficiency of the algorithm as a function of the system size. 
In Fig. 3(b) we present the result of iterating Eqs. (10) and (12) for system sizes of $K = 200$, 
400, 800, 1600 and 3200. The curves represent mean values over 1000 experiments. There 
is a strong dependency of the error per bit rate on the size of the system, which is expected 
to converge to the asymptotic limit (infinite system size) represented by the solid line. 

FIG. 3: (a) Number of iterations of Eq. (13) required for convergence as a function of $\beta$, for 
$\sigma_0^2 = 0.10$, where the error rate curve exhibits a discontinuity. (b) Finite size effects observed in the error 
rate curve when Eqs. (10) and (12) are iterated over the number of steps needed to reach the 
steady state. The noise level used is $\sigma_0^2 = 0.10$ with Kq = 50. The curves are mean values over 
1000 experiments. The curve obtained from the iteration of the steady state equations is presented 
as a reference. 

IV. CONCLUSIONS 

In summary, we employed a new variational algorithm based on replicated variable sys- 
tems to investigate two related problems: signal detection in CDMA and learning in the ILP. 
The new algorithm facilitates the use of message passing techniques in densely connected 
systems, even in systems that show a fragmented solution space, and represents an extension 
of existing algorithms similar to the extension of BP to SP. 

Results on the CDMA signal detection problem are superior to those of other existing 
algorithms [2, 16], without using any prior for the expected noise level. 

Results have also been obtained for intermediate and high load levels under various noise 
conditions, which are of higher relevance to ILP learning than to CDMA. These exhibit a 
first-order-like transition for critical load levels below a certain noise level ($\sigma_0^2 < 0.15$), 
which becomes second order as the noise level increases (at $\sigma_0^2 = 0.15$). No transition points 
have been identified above this noise level. Finally, we also examined finite size effects in 
the system, which are clearly present even at a system size of 3200 nodes. 

We are in the process of examining the suitability of the method for other applications [15]. 
While the approach seems promising, there is clearly a need for further research to fully 
determine the potential of the new algorithm. 